REMAPPING I/O DEVICE ADDRESSES INTO HIGH MEMORY
USING GART

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The invention pertains generally to computer systems. In particular, it
pertains to address mapping in computer systems.

2. Description of the Related Art 

Computer memory is usually addressed either directly or through the use of
mapping. Direct addressing involves specifying a memory address by placing the
address into a register. The address contained in that register is then directly applied
to the addressing bits on a memory bus. Since a register has a predetermined number
of bits, the address range that can be specified in the register is limited to the range
that can be specified with that number of bits. Many modern computer systems, such
as system 10 of Fig. 1, use 32-bit registers and address buses, permitting them to
directly address up to 4 gigabytes (GB) of memory. Since register width and memory
address width are usually the same, software programs and their associated data are
also generally limited to a 4 GB address space. Fig. 1 shows input-output (I/O)
controller 11 with an internal 32-bit address bus for controlling transfers between the
various attached devices. For simplicity, only the number of address lines is
marked in the figures. As a person of ordinary skill in the art will readily recognize,
the address lines will be accompanied by data lines and control lines as well. The
exact number and configuration of these lines will depend on the particular bus
standards being followed.

To reach more memory than is directly addressable by the contents of a 
register, two approaches are commonly used. In the two-stage approach, the standard 
address register provides some of the bits, while a separate register provides 
additional bits to extend the addressing range. For example, the separate register 
specifies one of several 4 GB blocks, while the standard 32-bit register specifies an 
address within that 4 GB block. Thus, a separate 4-bit register could specify one of 
sixteen blocks, for a total addressable space of 64 GB. Since most programs and 
their associated data will fit into a 4 GB memory space, the contents of the separate 
register do not need to be changed frequently, and the selected 4 GB block of 
memory can remain selected for a reasonable time. Fig. 1 shows the 4 additional 
address bits going from memory map 13 to memory controller 15 for a total of 36
address bits to memory 14. 
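
The arithmetic of the two-stage approach can be illustrated with a short Python sketch. The function name `extended_address` and the constant names are hypothetical, chosen only for illustration. The sketch assumes the separate 4-bit register simply supplies the four most significant bits of a 36-bit address.

```python
BLOCK_BITS = 4    # width of the separate block-select register
OFFSET_BITS = 32  # width of the standard address register
GIB = 1 << 30     # bytes in one gigabyte (binary)


def extended_address(block: int, offset: int) -> int:
    """Concatenate a 4-bit block number and a 32-bit offset into a 36-bit address."""
    if not 0 <= block < (1 << BLOCK_BITS):
        raise ValueError("block select does not fit in 4 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset does not fit in 32 bits")
    return (block << OFFSET_BITS) | offset


# Sixteen blocks of 4 GB each give 64 GB of addressable space.
BLOCK_SIZE_GIB = (1 << OFFSET_BITS) // GIB
TOTAL_SPACE_GIB = (1 << (BLOCK_BITS + OFFSET_BITS)) // GIB
```

Under these assumptions, block 0 leaves a 32-bit address unchanged, and block 15 with an offset of all ones reaches the top of the 64 GB range.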

Alternatively, an equivalent function can be performed in the CPU, which then
outputs the 36 address bits directly. In this configuration, memory map 13 or its 
functional equivalent is internal to CPU 12 rather than I/O controller 11. 

In a similar but unrelated mapping effort, graphics controllers have 
conventionally provided 32-bit direct addressing of a contiguous 4 GB address space. 
However, the memory to be addressed is physically located in main memory, which 
is allocated to the graphics application in small blocks on an as-available basis. Thus 
the memory allocated for the graphics application at any given time, while addressed
by the graphics controller as a range of contiguous virtual addresses, is actually
provided as a disjointed set of smaller blocks of physical addresses, which may not
even be in the same order. To correlate the virtual addresses to the physical
addresses, a mapping table is provided, which translates each page (or other
predetermined block size) of virtual memory into the physical page of memory
allocated to it. Fig. 1 shows a graphics address redirection table (GART) 17 for 
translating 32-bit addresses between graphics controller 16 and memory controller 15. 
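
As a minimal sketch of this kind of page remapping, the Python fragment below maps three contiguous virtual pages onto scattered, out-of-order physical pages. The 4 KB page size, the table contents, and the names `gart_pages` and `translate` are assumptions made for illustration, not values taken from Fig. 1.

```python
PAGE_SIZE = 4096  # assumed page size; other block sizes are possible

# Hypothetical mapping table: virtual page number -> physical page number.
# The virtual pages are contiguous; the physical pages are not, and they
# are not even in ascending order.
gart_pages = {0: 0x2A7, 1: 0x013, 2: 0x5F0}


def translate(virtual_addr: int) -> int:
    """Translate a virtual address to a physical address, page by page."""
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    return gart_pages[page] * PAGE_SIZE + offset
```

The offset within the page passes through unchanged; only the page number is replaced.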

Although such mapping techniques have been applied to main memory and
graphics, standard I/O buses and their attached peripherals have generally not
benefited from such address mapping techniques. Most standard I/O buses, such as
peripheral component interconnect (PCI) bus 18, are limited to 32 or fewer address
bits, and therefore cannot directly address more than 4 GB of memory. Since they
frequently transfer data directly between the peripherals and main memory, this limits
these transfers to the lower 4 GB of main memory, while the programs that use the
data may be located in higher 4 GB sections of memory and therefore be unable to
directly reach the data. The conventional approach to this problem is to transfer the
data to/from the lower 4 GB memory space through bus controller 19 over the
internal bus of I/O controller 11, and use software to transfer the data between the
lower 4 GB and the 4 GB section of memory 14 in which the application program is
located. This process is very slow and places an unreasonable burden on the
processor and main memory bus, since it requires three accesses to memory rather 
than one: 1) write the data to a temporary buffer, 2) read the data from the 
temporary buffer, and 3) write the data to a permanent buffer. In a system that is 
already limited by memory bandwidth, this can cause an unacceptable degradation in 
performance. 
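
The three-access cost can be made concrete with a toy model. The Python sketch below counts memory accesses for the conventional staged transfer and for a direct transfer; the class `CountingMemory` and both function names are hypothetical, and the model ignores burst sizes, caching, and bus arbitration.

```python
class CountingMemory:
    """Toy main memory that counts every read and write."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def write(self, addr, data):
        self.accesses += 1
        self.cells[addr] = data

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]


def bounce_transfer(mem, data, temp_addr, final_addr):
    """Conventional path: stage the data in the lower 4 GB, then copy it up."""
    mem.write(temp_addr, data)     # 1) device writes the temporary buffer
    staged = mem.read(temp_addr)   # 2) software reads the temporary buffer
    mem.write(final_addr, staged)  # 3) software writes the permanent buffer


def direct_transfer(mem, data, final_addr):
    """Direct path: the device writes the permanent buffer in one access."""
    mem.write(final_addr, data)
```

In this model the staged path touches memory three times per transfer and the direct path once, which matches the count given above.
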

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a system of the prior art.
Fig. 2 shows a system of the invention.
Fig. 3 shows an address translator of the invention.
Fig. 4 shows a conceptual chart of a mapping scheme of the invention.



DETAILED DESCRIPTION OF THE INVENTION 


The invention implements address mapping between an I/O bus interface and 
main memory that expands the directly addressable range of the I/O bus, while not 
requiring a separate mapping circuit to implement it. An embodiment of the 
invention takes advantage of an existing mapping function that is used in a known
graphics controller, and enhances it for this use.

Fig. 2 shows a system 20 of the invention. I/O controller 21 can control data
transfers between CPU 22, memory 24, graphics controller 26, and bus 28, using
memory map 23, memory controller 25, GART 27, and bus controller 29,
respectively. This system differs from the prior art system of Fig. 1. Although
memory map 23 still provides an enhanced address range to memory controller 25
using the additional address bits (a total of 36 address bits in the illustrated
embodiment), GART 27 can also provide the additional address bits, and bus
controller 29 can now be coupled to GART 27 instead of being coupled directly to the
internal bus as it was in Fig. 1. The expansion of GART 27 to 36 address bits
permits GART 27 to directly address up to 64 GB of memory. However, since bus
controller 29 is still limited to 32 address bits, it cannot make immediate use of this
expanded address capability. By coupling bus controller 29 to GART 27, and
modifying the expanded GART to accept an interface to a device other than graphics
controller 26, bus controller 29 can be permitted to access memory outside the
4 GB range to which the bus controller is normally limited. Thus, devices on
PCI bus 28 can transfer data directly to any part of the full memory range of 64 GB,
without an intermediate transfer step in the software.

In one embodiment of the invention, memory map 23 can be a part of I/O
controller 21, disposed in Fig. 2 between CPU 22 and all other devices interfaced
through I/O controller 21. As previously described for Fig. 1, the additional address
bits produced by memory map 23 can also be produced directly by CPU 22, using an
equivalent mapping function or some other method. Throughout this description, any
reference to memory map can also be applied to a CPU that directly outputs all the
necessary address bits.

Fig. 3 shows a more detailed view of a GART 27 of the invention. GART 27
includes address translator 39 and translation control circuit 38. Translator 39 can
receive a 32-bit address from graphics controller 26, and can also receive a 32-bit
address from bus controller 29. In one embodiment these are two separate interfaces.
After the address translation takes place, translator 39 can provide a 36-bit address to
memory map 23 and memory controller 25. In one embodiment, this is a common
bus interface to both devices.

The translation of one address to another can be programmable, so translation 
control circuit 38 can receive instructions from CPU 22 on how to program the 
translation tables, and then place the proper data into translator 39. These
instructions can be received over the common bus shared by GART 27, memory map 
23 and memory controller 25. Since translation control circuit 38 is an addressable 
device itself, it typically has an address that is within the standard peripheral address 
range, and does not need the additional 4 address bits. The portion of the bus 
connected to translation control circuit 38 is therefore shown as having only the 
standard 32 address bits, while the connected portions of the same bus are shown in 
Fig. 3 as "32 + 4" to indicate they have the basic 32 address bits shared with circuit 
38, plus the extra 4 address bits used for the expanded address range. 

Fig. 4 shows a conceptual flow diagram of the operation of translator 39.
When a 32-bit address from bus controller 29 is received by input register 41, the
address has an upper portion U1 and a lower portion L1. In one embodiment, upper
portion U1 contains 20 address bits that will be translated, while lower portion L1
contains 12 address bits that will remain unchanged, so that memory can be translated
in blocks of 4 kilobytes (KB). Other block sizes can also be chosen. Once the address
is received in register 41, the upper portion U1 can be compared with the contents of
a table 43, which can be configured as a graphics translation lookaside buffer
(GTLB). Table 43 can also be thought of as a content-addressable memory (CAM),
because upper address portion U1 can be compared against all entries U11-U16 to see
if it matches the contents of any of those entries. If a match is found, the
corresponding entry U21-U26 can then be delivered to the upper bits of output
register 44, where it provides the upper portion U2 of the translated address. This
can be merged with the original lower portion L1 to form the complete translated
address in output register 44. The number of bits in entries U21-U26 can be
independent of the number of bits in entries U11-U16. Following the previous
examples, U1 would contain 20 bits, while U2 would contain 24 bits, and L1 would
remain constant at 12 bits, resulting in a 32-bit to 36-bit address translation in 4 KB
blocks.
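
The hit path just described can be sketched in Python as follows. This is an illustrative software model only: a dictionary lookup stands in for the parallel CAM comparison, and the names `split` and `gtlb_translate` are hypothetical.

```python
from typing import Dict, Optional, Tuple

LOWER_BITS = 12                      # L1: 12 untranslated bits (4 KB blocks)
LOWER_MASK = (1 << LOWER_BITS) - 1


def split(addr32: int) -> Tuple[int, int]:
    """Split a 32-bit address into upper portion U1 (20 bits) and lower portion L1."""
    return addr32 >> LOWER_BITS, addr32 & LOWER_MASK


def gtlb_translate(addr32: int, gtlb: Dict[int, int]) -> Optional[int]:
    """Return the 36-bit translated address, or None if U1 is not in the table."""
    u1, l1 = split(addr32)
    u2 = gtlb.get(u1)                # stands in for comparing U1 with all entries
    if u2 is None:
        return None                  # miss: no matching entry
    return (u2 << LOWER_BITS) | l1   # merge 24-bit U2 with the original L1
```

For example, with a single entry mapping U1 = 0x12345 to U2 = 0xABCDEF, the 32-bit address 0x12345678 translates to the 36-bit address 0xABCDEF678.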

The matching function used in the preceding table can become burdensome if
the number of entries to be compared becomes large. Therefore the number of
entries can be limited to a predetermined number that will not create this burden. In
one embodiment, the number of entries in the table is twenty, although only six
entries are shown in Fig. 4 for simplicity. Since the number of possible entries that
might eventually be placed in the table is much larger, the table can be configured as
a cache memory, with the most likely entries placed in the table and later replaced by
other, more likely entries as circumstances require. In general, the system can
initialize table 43 with one or more predetermined destination buffers for impending
transfers, so the correct entries will be missing from table 43 only if there are more
intended transfers than can be contained in table 43 at one time. Alternatively,
well-known cache replacement schemes can be used to update the contents of table 43.
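
One such replacement scheme is least-recently-used (LRU) eviction. The description does not name a specific policy, so the choice of LRU here is an assumption, as are the class name `Gtlb` and its method names. The sketch models a fixed-capacity table in Python.

```python
from collections import OrderedDict


class Gtlb:
    """Fixed-capacity translation table with LRU replacement (assumed policy)."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.entries = OrderedDict()  # U1 -> U2, least recently used first

    def lookup(self, u1):
        """Return U2 for a matching entry, or None on a miss."""
        if u1 not in self.entries:
            return None
        self.entries.move_to_end(u1)  # mark as most recently used
        return self.entries[u1]

    def install(self, u1, u2):
        """Add or update an entry, evicting the LRU entry if the table is full."""
        if u1 in self.entries:
            self.entries.move_to_end(u1)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[u1] = u2
```

Preloading the table for impending transfers corresponds to calling `install` for each destination buffer before the transfer begins.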

With a well-managed replacement scheme, most addresses placed in input
register 41 will be contained in table 43. For those few that are not, an alternate
process can be followed. If table 43 is searched and upper portion U1 is not
contained in table 43, GART 27 can then access table 42 in main memory 24. Table
42 can contain a much larger number of entries than table 43, and in fact can contain
all possible entries that might match the contents of U1. In one embodiment, table 42
contains thousands of entries, although only seven entries are shown in Fig. 4 for
simplicity. Since main memory is usually not configured for a content-addressable
search, upper portion U1 can be used as an index into table 42 to locate the table
entry associated with the particular value of U1. Various indexing schemes can be
used, which are well known in the art and are therefore not further described here.
The table entry identified by the indexing operation can contain one of the translation
values U201-U207, which is then read into GART 27 and placed into the upper
portion U2 of output register 44. It can be merged with the original lower portion L1
to form the desired translated address contained in register 44.
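
The two-stage lookup can be sketched in Python as follows. The sketch assumes that table 42 is a flat array indexed directly by U1, which is only one of the indexing schemes left open above; the function name `translate` and its parameters are hypothetical.

```python
LOWER_BITS = 12                      # L1: 12 untranslated bits
LOWER_MASK = (1 << LOWER_BITS) - 1


def translate(addr32, gtlb, memory_table):
    """Translate a 32-bit address; return (36-bit address, hit flag).

    gtlb         -- small table 43, modeled as a dict mapping U1 to U2
    memory_table -- large table 42, modeled as a list indexed by U1
    """
    u1 = addr32 >> LOWER_BITS
    l1 = addr32 & LOWER_MASK
    hit = u1 in gtlb
    if hit:
        u2 = gtlb[u1]                # fast path: entry found in table 43
    else:
        u2 = memory_table[u1]        # slow path: index table 42 in main memory
    return (u2 << LOWER_BITS) | l1, hit
```

The hit flag is returned only so a caller can observe which path was taken. The sketch does not model the extra delay of the main memory access.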

By following this two-stage operation, most addresses can be translated on the 
fly through table 43, so that the 32-bit address transmitted by a device on the PCI bus 
will be converted into the proper 36-bit address before reaching memory controller 
25, and the PCI device will therefore be able to directly reach the full 64 GB of 
memory space with little or no increase in bus latency. For a small number of 
addresses, table 42 in main memory can be accessed before the address translation 
can be completed, resulting in a delay while main memory is accessed. With a 
properly managed scheme for updating the entries in table 43, this secondary 
operation will happen so seldom that overall throughput will be significantly 
improved over the prior art process of relocating the data in memory after the transfer
from the bus to memory is complete. 

By modifying an existing circuit (the GART interface) to expand the address 
range, and using that interface for a device external to its original purpose, the
aforementioned advantages can be implemented without significantly adding new 
circuitry, and with minimal modifications to existing devices. 

Although the bit widths described herein are 32-bit buses/registers with an
address range of 4 GB, and an additional 4 bits to expand that to 64 GB, other bit
widths can be used without departing from the invention. The invention can be
implemented in circuitry or as a method. The invention can also be implemented as
instructions stored on a machine-readable medium, which can be read and executed
by at least one processor to perform the functions described herein. A
machine-readable medium includes any mechanism for storing or transmitting information in a
form readable by a machine (e.g., a computer). For example, a machine-readable
medium can include read only memory (ROM); random access memory (RAM);
magnetic disk storage media; optical storage media; flash memory devices; electrical,
optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared
signals, digital signals, etc.), and others.

The foregoing description is intended to be illustrative and not limiting. 
Variations will occur to those of skill in the art. Those variations are intended to be 
included in the invention, which is limited only by the spirit and scope of the
appended claims.

We claim:

1. A method, comprising:
providing a first address containing a first number of bits and having an upper
portion and a lower portion;
comparing the upper portion with a plurality of first entries in a first table;
if the upper portion matches a particular one of the plurality of first entries:
selecting a second entry in the first table associated with the particular
one of the plurality of first entries;
combining the second entry with the lower portion to form a first
translated address; and
transmitting the first translated address.

2. The method of claim 1, further comprising:
if the upper portion does not match any of the plurality of first entries in the
first table:
accessing a second table having a plurality of third entries;
indexing the second table with the upper portion to identify a particular
one of the plurality of third entries;
combining the particular one of the plurality of third entries with the
lower portion to form a second translated address; and
transmitting the second translated address.

3. The method of claim 2, wherein the first table is contained in an input-output
controller and the second table is contained in main memory.

4. The method of claim 2, wherein transmitting the first and second translated
addresses includes transmitting to a memory controller.

5. The method of claim 1, wherein the first table is a translation lookaside
buffer.

6. The method of claim 1, wherein providing a first address includes providing a
first address from a bus controller.

7. The method of claim 6, wherein the first table is also used to translate
addresses from a graphics controller.

8. A method, comprising:
using a conversion table to translate a first address from a graphics controller
to a memory; and
using the conversion table to translate a second address from a bus controller
to the memory.

9. The method of claim 8, wherein using the conversion table includes using a
translation lookaside buffer.

10. The method of claim 8, wherein translating the second address includes
translating the second address to a third address having a different number of bits than
the second address.

11. The method of claim 10, wherein translating the first address includes
translating the first address to a fourth address having a same number of bits as the
first address.

12. The method of claim 8, wherein using the conversion table to translate the
second address includes:
comparing a first portion of the second address with entries in a first table;
if the first portion matches a particular one of the entries in the first table,
combining a value associated with the particular one with a second
portion of the second address to form a translated address.

13. The method of claim 12, further comprising:
if the first portion does not match any of the entries in the first table,
referring to a second table to translate the second address.

14. The method of claim 13, wherein:
comparing includes comparing the first portion of the second address with
entries in a first table in an input-output controller; and
referring to the second table includes referring to the second table in main
memory.

15. An apparatus, comprising:
a translation lookaside buffer coupled to an input register and an output
register;
control logic coupled to the translation lookaside buffer, the input register, and
the output register;
wherein the control logic is to compare a first portion of an initial address in
the input register with entries in the translation lookaside buffer; and if
a matching entry is found, to combine a first value associated with the
matching entry with a second portion of the initial address to form a
first translated address and hold the first translated address in the
output register.

16. The apparatus of claim 15, wherein the control logic is further to:
access a table in memory if the matching entry is not found;
find a second value in the table associated with the first portion;
combine the second value with the second portion to form a second translated
address; and
hold the second translated address in the output register.

17. The apparatus of claim 16, wherein:
the control logic includes logic for first and second control flows;
the first control flow is to translate an initial graphics controller address and
does not access the second table; and
the second control flow is to translate an initial bus controller address and can
access the second table.

18. The apparatus of claim 16, wherein the first and second translated addresses
each have more bits than the initial address.

19. A system, including:
a processor;
a memory;
a graphics controller;
a bus controller;
an input-output controller coupled to the processor, memory, graphics
controller and bus controller, the input-output controller including:
a translation lookaside buffer coupled to an input register and an output
register;
control logic coupled to the translation lookaside buffer, the input
register, and the output register;
wherein the control logic is to compare a first portion of an initial
address in the input register with entries in the translation
lookaside buffer; and if a matching entry is found, to combine a
first value associated with the matching entry with a second
portion of the initial address to form a first translated address
and hold the first translated address in the output register.

20. The system of claim 19, wherein the control logic is further to:
access a table in memory if the matching entry is not found;
find a second value in the table associated with the first portion;
combine the second value with the second portion to form a second translated
address; and
hold the second translated address in the output register.

21. The system of claim 20, wherein:
the control logic includes logic for first and second control flows;
the first control flow is to translate an initial graphics controller address and
does not access the second table; and
the second control flow is to translate an initial bus controller address and can
access the second table.

22. The system of claim 20, wherein the first and second translated addresses each
have more bits than the initial address.

23. A machine-readable medium having stored thereon instructions, which when
executed by a machine cause said processor to perform:
reading a first address containing a first number of bits and having an upper
portion and a lower portion;
comparing the upper portion with a plurality of first entries in a first table;
if the upper portion matches a particular one of the plurality of first entries:
selecting a second entry in the first table associated with the particular
one of the plurality of first entries;
combining the second entry with the lower portion to form a first
translated address; and
transmitting the first translated address.

24. The medium of claim 23, further comprising:
if the upper portion does not match any of the plurality of first entries in the
first table:
accessing a second table having a plurality of third entries;
indexing the second table with the upper portion to identify a particular
one of the plurality of third entries;
combining the particular one of the plurality of third entries with the
lower portion to form a second translated address; and
transmitting the second translated address.

25. The medium of claim 24, wherein the first table is contained in an input-output
controller and the second table is contained in main memory.

26. The medium of claim 24, wherein transmitting the first and second translated
addresses includes transmitting to a memory controller.

27. The medium of claim 23, wherein the first table is a translation lookaside
buffer.

28. The medium of claim 23, wherein providing a first address includes providing
a first address from a bus controller.

29. The medium of claim 28, wherein the first table is also used to translate
addresses from a graphics controller.

ABSTRACT OF THE DISCLOSURE

An address translation apparatus and method that can convert a limited-range
memory address from a peripheral device to an expanded-range memory address on
the fly. The invention can expand the limited address capability of a peripheral bus,
such as a PCI bus with a 4 GB address range, to a much larger address capability,
such as a 64 GB address range. This conversion can be performed on the fly by
hardware, so that no appreciable delay in transfer time is created. The conversion
can be performed by adding features to a conventional graphics controller interface,
thus minimizing the impact on circuit complexity and system cost.